Difan Zou

An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models

Nov 26, 2024
Via arXiv

How Does Critical Batch Size Scale in Pre-training?

Oct 29, 2024
Via arXiv

Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers

Oct 24, 2024
Via arXiv

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Aug 08, 2024
Via arXiv

Extracting Training Data from Unconditional Diffusion Models

Jun 18, 2024
Via arXiv

Explainable Bayesian Recurrent Neural Smoother to Capture Global State Evolutionary Correlations

Jun 17, 2024
Via arXiv

The Implicit Bias of Adam on Separable Data

Jun 15, 2024
Via arXiv

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

Jun 04, 2024
Via arXiv

Slight Corruption in Pre-training Data Makes Better Diffusion Models

May 30, 2024
Via arXiv

A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models

May 28, 2024
Via arXiv